Suffix-tree analyser (STAN): looking for nucleotidic and peptidic patterns in chromosomes

Authors

  • Jacques Nicolas
  • Patrick Durand
  • Grégory Ranchy
  • Sébastien Tempel
  • Anne-Sophie Valin
Abstract

Summary: We have developed STAN (suffix-tree analyser), a tool to search for nucleotidic and peptidic patterns within whole chromosomes. The pattern syntax uses a string-variable, grammar-like formalism that allows the description of complex patterns, including ambiguities, insertions/deletions, gaps, repeats and palindromes. STAN is based on a reduction to multipart matching on a suffix-tree data structure and can handle large DNA sequences, whether assembled or not.
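To make the idea of matching an ambiguous pattern against a suffix-tree-like index concrete, here is a minimal sketch. It is not STAN's pattern syntax or its algorithm, neither of which is given in the abstract. It assumes a naive, uncompressed suffix trie (quadratic space, suitable only for short demo sequences) and the standard IUPAC nucleotide ambiguity codes. The function names `build_suffix_trie`, `collect_starts` and `find_ambiguous` are hypothetical.

```python
# Illustrative sketch only; not STAN's pattern language or implementation.
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def build_suffix_trie(text):
    """Build a naive uncompressed suffix trie (O(n^2) space, demo use only)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        # Mark the end of this suffix with its start position.
        # The key "$" is assumed never to occur in the sequence itself.
        node["$"] = start
    return root


def collect_starts(node):
    """Return the start positions of all suffixes passing through `node`."""
    starts = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "$":
                starts.append(child)
            else:
                stack.append(child)
    return starts


def find_ambiguous(trie, pattern):
    """Return sorted start positions where an IUPAC pattern matches."""
    frontier = [trie]
    for code in pattern:
        allowed = IUPAC[code]
        # Follow every branch compatible with the ambiguity code.
        frontier = [
            node[base]
            for node in frontier
            for base in allowed
            if base in node
        ]
        if not frontier:
            return []
    positions = []
    for node in frontier:
        positions.extend(collect_starts(node))
    return sorted(positions)
```

For example, with `trie = build_suffix_trie("GATTACAGATCACA")`, the call `find_ambiguous(trie, "GATY")` returns `[0, 7]`, because `Y` stands for either `C` or `T`. A real tool would use a compact suffix tree and a much richer pattern language (gaps, repeats, palindromes); the point here is only that ambiguity turns a single root-to-node walk into a walk over a small set of nodes.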


Similar resources

Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth

Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...


Constructing Chromosome Scale Suffix Trees

Suffix trees have been the focus of significant research interest as they permit very efficient solutions to a range of string and sequence searching problems. Given a suffix tree that encodes a particular string, it is possible to solve problems such as searching for a specific pattern in time proportional to the length of the pattern rather than the length of the string. Suffix trees can also...


Space-efficient K-MER algorithm for generalized suffix tree

Suffix trees have emerged to be very fast for pattern searching yielding O (m) time, where m is the pattern size. Unfortunately their high memory requirements make it impractical to work with huge amounts of data. We present a memory efficient algorithm of a generalized suffix tree which reduces the space size by a factor of 10 when the size of the pattern is known beforehand. Experiments on th...


Efficient Discovery of Proximity Patterns with Suffix Arrays

We describe an efficient implementation of a text mining algorithm for discovering a class of simple string patterns. With an index structure, called the virtual suffix tree, for pattern discovery built on the top of the suffix array, the resulting algorithm is simple and fast in practice compared with the previous implementation with the suffix tree.

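Several of the entries above rest on the same idea: index the text once, then locate a pattern at a cost that depends mainly on the pattern length. The sketch below illustrates this with a plain suffix array and binary search. It is not the virtual suffix tree or any algorithm from the listed papers, and it runs in O(m log n) comparisons per query rather than the O(m) quoted for suffix trees. The construction by sorting suffixes is deliberately naive, and the names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return sorted start positions of `pattern`, found by binary search."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in [first, lo) begins with the pattern.
    return sorted(suffix_array[first:lo])
```

For example, with `text = "GATTACAGATCACA"` and `sa = build_suffix_array(text)`, the call `find_occurrences(text, sa, "GAT")` returns `[0, 7]`. The suffix array stores one integer per text position, which is the kind of space saving over a pointer-based suffix tree that motivates the space-efficient variants listed above.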


Journal:
  • Bioinformatics

Volume 21, Issue 24

Pages: -

Publication date: 2005